PRIS at 2012 TREC Medical Track: Query Expansion, Retrieval and Ranking

Authors

  • Jiayue Zhang
  • Lin Lin
  • Shudang Diao
  • Yukun Li
  • Runnan Liu
  • Weiran Xu
  • Jun Guo
Abstract

1 Data Preprocessing

1.1 XML parsing

The official datasets are in XML format, so we have to parse them before indexing. We chose Lucene as our tool for indexing and searching, and we selected Jakarta Commons Digester (hereafter "Digester") to parse the XML documents. Digester converts each XML document into a Java object, from which we read the fields we need. In addition, we process the "report_text" tag in the XML documents to extract the patient's age and sex, which are very important fields for the search task.

1.2 Negation Detection

Medical diagnosis reports often contain phrases such as "did not have head pain" or "there is no pain in your leg". These phrases cause trouble in medical text retrieval. For example, when we search for patients who have a headache, we may retrieve a report like this:

This patient is a **AGE[in 50s]-year-old male with a past medical history of multiple transplants including small bowel, liver, and pancreas in 1998 and status post kidney transplant in 2006, presents with fever. The patient states he woke this morning and thought to have fevers and chills. He also has had some vomiting and diarrhea. Denies any belly pain. He states he feels a little short of breath. He denies any chest pain. No sore throat. No headache.....

In fact, this patient has fevers and chills, not a headache. To solve this problem, we use the NegEx algorithm [5], which is well known among text mining researchers for finding terms used in a negated sense. A Java class implementing Wendy Chapman's NegEx algorithm, written by Junebae Kye, is available. Building on this class, we wrote a program to perform negation detection, and the results show that this method improves performance.

2 Indexing Model

The main component is a search engine based on Apache Lucene. Lucene is a powerful Java library that lets you easily add document retrieval to any application. In recent years Lucene has become exceptionally popular and is now the most widely used information retrieval library. We used Lucene for indexing, as it provides the functions needed for this purpose. Documents and fields are Lucene's fundamental units of indexing and searching. A document is Lucene's atomic unit of …
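As a rough illustration of the field extraction described in Section 1.1, the sketch below pulls the age and sex out of a "report_text" element. It is not the authors' code: it uses the JDK's built-in DOM parser instead of Digester so that it runs without extra libraries, the enclosing `report` element and the class and method names are hypothetical, and treating the first standalone "male" or "female" word as the patient's sex is an assumed heuristic. Only the `report_text` tag and the `**AGE[...]` token come from the text above.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReportFieldSketch {
    // Matches the de-identified age token from the sample report, e.g. "**AGE[in 50s]".
    private static final Pattern AGE = Pattern.compile("\\*\\*AGE\\[([^\\]]+)\\]");
    // Assumed heuristic: the first standalone "male"/"female" word gives the sex.
    private static final Pattern SEX =
            Pattern.compile("\\b(female|male)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the text of the first report_text element, or "" if there is none. */
    public static String reportText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("report_text");
            return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse report XML", e);
        }
    }

    /** Returns the content of the age token (e.g. "in 50s"), or null if absent. */
    public static String age(String text) {
        Matcher m = AGE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Returns "male" or "female" in lower case, or null if neither word appears. */
    public static String sex(String text) {
        Matcher m = SEX.matcher(text);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    public static void main(String[] args) {
        // Hypothetical document layout; only the report_text tag is from the paper.
        String xml = "<report><report_text>This patient is a **AGE[in 50s]-year-old male "
                + "who presents with fever.</report_text></report>";
        String text = reportText(xml);
        System.out.println("age=" + age(text) + ", sex=" + sex(text));
    }
}
```

The extracted values could then be stored as separate fields of the indexed document, so that a query can filter on them.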
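The negation detection idea from Section 1.2 can be sketched in a few lines. This is a heavily simplified illustration in the spirit of NegEx, not Junebae Kye's class and not the full algorithm: the trigger and terminator lists are a tiny assumed subset, the five-token scope window is an assumption, only single-word terms are handled, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NegationSketch {
    // Tiny illustrative subset of negation triggers; the real NegEx lexicon is much larger.
    private static final List<String> TRIGGERS =
            Arrays.asList("no", "not", "denies", "denied", "without");
    // Words assumed to end the scope of a negation inside a sentence.
    private static final List<String> TERMINATORS = Arrays.asList("but", "however", "except");
    // Assumed scope: a trigger negates at most this many following tokens.
    private static final int WINDOW = 5;

    private static String[] tokenize(String sentence) {
        return sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** True if the single-word term occurs within the scope of a preceding trigger. */
    public static boolean isNegated(String sentence, String term) {
        String target = term.toLowerCase(Locale.ROOT);
        int remaining = 0; // tokens still inside the current negation scope
        for (String token : tokenize(sentence)) {
            if (TERMINATORS.contains(token)) {
                remaining = 0;
            } else if (TRIGGERS.contains(token)) {
                remaining = WINDOW;
            } else {
                if (remaining > 0 && token.equals(target)) {
                    return true;
                }
                if (remaining > 0) {
                    remaining--;
                }
            }
        }
        return false;
    }

    /** True if some sentence contains the term and no occurrence in it is negated. */
    public static boolean mentionsAffirmed(String report, String term) {
        String target = term.toLowerCase(Locale.ROOT);
        for (String sentence : report.split("[.!?]+")) {
            boolean present = Arrays.asList(tokenize(sentence)).contains(target);
            if (present && !isNegated(sentence, term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String report = "Presents with fever. Denies any belly pain. No sore throat. No headache.";
        System.out.println(mentionsAffirmed(report, "headache"));
        System.out.println(mentionsAffirmed(report, "fever"));
    }
}
```

With a check like this, a report whose only mention of "headache" is negated can be kept out of the results for a headache query, for example by dropping or separately tagging the negated terms before indexing.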

Similar resources

PRIS at TREC 2011 Medical Record Track

Our method to accomplish the Medical Record Track is described in this paper. For ad hoc retrieval, Indri and Xapian are used for indexing, searching, and initial query expansion. The main query expansion is achieved using LSI. The evaluation results show the performance of our system is above the average.

PRIS at TREC 2013 Microblog Track

This paper describes the real-time search system we built for the TREC 2013 Microblog Track. We focused on query expansion and ranking algorithms and employed different strategies. For query expansion, we applied pseudo-relevance feedback using WAF algorithms and a refined tf ∗ idf formula. For the re-ranking part, our system makes use of various tweet features, such as expansion terms, URL informatio...

PRIS in TREC 2008 Blog Track

This paper describes the participation of BUPT (pris) in the baseline ad hoc retrieval task and the opinion retrieval task at Blog Track 2008. The system adopts a two-stage strategy in the opinion retrieval task. In the first stage, the system carries out a basic topic relevance retrieval to get the top 1,000 documents for each query. In the second stage, our system combines several Maximum Entropy based c...

SNUMedinfo at TREC CDS track 2014: Medical case-based retrieval task

This paper describes the participation of the SNUMedinfo team at the TREC Clinical Decision Support track 2014. This task is about medical case-based retrieval. The case description is used as the query text. For each query, one of three categories (Diagnosis, Test and Treatment) is designated as the target information need. First, we used an external tagged knowledge-based query expansion method for the rel...

A Study of Faceted Blog Distillation--PRIS at TREC 2009 Blog Track

This paper describes the participation of BUPT (pris) in the faceted blog distillation task at Blog Track 2009. The system adopts a two-stage strategy in the faceted blog distillation task. In the first stage, the system carries out a basic topic relevance retrieval to get the top k blogs for each query. In the second stage, different models are designed to judge the facets and perform the ranking.

ECNU at 2015 CDS Track: Two Re-ranking Methods in Medical Information Retrieval

This paper summarizes our work on the TREC 2015 Clinical Decision Support Track. We present a customized learning-to-rank algorithm and a query term position based re-ranking model to better satisfy the tasks. We design two learning-to-rank frameworks: a pointwise loss function based on random forest and a pairwise loss function based on SVM. The position based re-ranking model is composed of...

Publication date: 2012